Abstract: Polar codes were recently introduced by Arikan.
They achieve the capacity of arbitrary symmetric binary-input
discrete memoryless channels under a low complexity successive
cancellation decoding strategy. The original polar code construction
is closely related to the recursive construction of Reed-Muller
codes and is based on the 2 x 2 matrix
$\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. It was shown by
Arikan and Telatar that this construction achieves an error
exponent of $\frac{1}{2}$, i.e., that for sufficiently large
blocklengths the error probability decays exponentially in the
square root of the length. It was already mentioned by Arikan that
in principle larger matrices can be used to construct polar codes.
A fundamental question then is whether there exist matrices with
exponent exceeding $\frac{1}{2}$. We first show that any
$\ell \times \ell$ matrix none of whose column permutations is upper
triangular polarizes symmetric channels. We then characterize the
exponent of a given square matrix and derive upper and lower bounds
on achievable exponents. Using these bounds we show that there are
no matrices of size less than 15 with exponents exceeding
$\frac{1}{2}$. Further, we give a general construction based on BCH
codes which for large n achieves exponents arbitrarily close to 1
and which exceeds $\frac{1}{2}$ for size 16.



I. Introduction 

Polar codes, introduced by Arikan in [1], are the first 
provably capacity achieving codes for arbitrary symmetric 
binary-input discrete memoryless channels (B-DMC) with low 
encoding and decoding complexity. The polar code construction
is based on the following observation: Let

$$G_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \tag{1}$$

Apply the transform $G_2^{\otimes n}$ (where $^{\otimes n}$ denotes
the $n$th Kronecker power) to a block of $N = 2^n$ bits and transmit
the output through independent copies of a B-DMC W (see 
Figure 1). As n grows large, the channels seen by individual 
bits (suitably defined in [1]) start polarizing: they approach 
either a noiseless channel or a pure-noise channel, where 
the fraction of channels becoming noiseless is close to the 
symmetric mutual information I(W). 
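
The transform itself is easy to write down explicitly. The following is a minimal sketch, assuming matrices are stored as Python lists of 0/1 rows; the helper names `kron`, `kron_power` and `encode` are illustrative and do not come from [1], and the bit-reversal permutation used in [1] is omitted.

```python
def kron(a, b):
    """Kronecker product of two 0/1 matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def kron_power(g, n):
    """n-th Kronecker power of g, for n >= 1."""
    result = g
    for _ in range(n - 1):
        result = kron(result, g)
    return result

def encode(u, g):
    """Row vector times matrix over GF(2): x = u G."""
    return [sum(u[i] * g[i][j] for i in range(len(u))) % 2
            for j in range(len(g[0]))]

G2 = [[1, 0], [1, 1]]
G2_2 = kron_power(G2, 2)          # 4 x 4 transform, block length N = 4
x = encode([1, 0, 1, 1], G2_2)    # bits sent over 4 copies of W
print(G2_2)
print(x)
```

The same helpers work unchanged for a larger kernel $G$, in which case the block length after $n$ steps is the $n$th power of the kernel size.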

It was conjectured in [1] that polarization is a general
phenomenon, and is not restricted to the particular transformation
$G_2^{\otimes n}$. In this paper we first give a partial affirmation
to this conjecture. In particular, we consider transformations of
the form $G^{\otimes n}$ where $G$ is an $\ell \times \ell$ matrix
for $\ell \ge 3$ and provide necessary and sufficient conditions for
such $G$s to polarize symmetric B-DMCs.

For the matrix $G_2$ it was shown by Arikan and Telatar [2]
that the block error probability for polar coding and successive
cancellation decoding is $O(2^{-2^{n\beta}})$ for any fixed
$\beta < \frac{1}{2}$, where $2^n$ is the blocklength. In this case
we say that $G_2$ has exponent $\frac{1}{2}$. We show that this
exponent can be improved by considering larger matrices. In fact,
the exponent can be made arbitrarily close to 1 by increasing the
size of the matrix $G$.

Fig. 1. The transform $G^{\otimes n}$ is applied and the resulting
vector is transmitted through the channel $W$.

Finally, we give an explicit construction of a family of
matrices, derived from BCH codes, with exponent approaching
1 for large $\ell$. This construction results in a matrix whose
exponent exceeds $\frac{1}{2}$ for $\ell = 16$.

II. Preliminaries 

In this paper we deal exclusively with symmetric channels:

Definition 1: A binary-input discrete memoryless channel
(B-DMC) $W : \{0,1\} \to \mathcal{Y}$ is said to be symmetric if
there exists a permutation $\pi : \mathcal{Y} \to \mathcal{Y}$ such
that $\pi = \pi^{-1}$ and $W(y|0) = W(\pi(y)|1)$ for all
$y \in \mathcal{Y}$.

Let $W : \{0,1\} \to \mathcal{Y}$ be a symmetric binary-input
discrete memoryless channel (B-DMC). Let $I(W) \in [0,1]$ denote the
mutual information between the input and output of $W$ with
uniform distribution on the inputs. Also, let $Z(W) \in [0,1]$
denote the Bhattacharyya parameter of $W$, i.e.,

$$Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(y|0)\, W(y|1)}.$$
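
Both parameters are straightforward to evaluate for a channel with a finite output alphabet. The sketch below assumes a channel is represented as a list of pairs $(W(y|0), W(y|1))$, one pair per output symbol; this representation and the function names are illustrative choices, not notation from the paper.

```python
import math

def bhattacharyya(W):
    """Z(W) = sum over y of sqrt(W(y|0) * W(y|1))."""
    return sum(math.sqrt(p0 * p1) for p0, p1 in W)

def symmetric_mutual_information(W):
    """I(W) in bits, with a uniform distribution on the input."""
    total = 0.0
    for p0, p1 in W:
        q = 0.5 * (p0 + p1)              # probability of this output
        for p in (p0, p1):
            if p > 0:
                total += 0.5 * p * math.log2(p / q)
    return total

def bsc(p):
    """Binary symmetric channel with crossover probability p."""
    return [(1 - p, p), (p, 1 - p)]

def bec(eps):
    """Binary erasure channel; the middle output is the erasure."""
    return [(1 - eps, 0.0), (eps, eps), (0.0, 1 - eps)]

for name, W in (("BSC(0.11)", bsc(0.11)), ("BEC(0.5)", bec(0.5))):
    print(name, symmetric_mutual_information(W), bhattacharyya(W))
```

For the erasure channel the two quantities are tied together exactly, $I(W) = 1 - Z(W)$, which makes it a convenient test case.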

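Fix an $\ell \ge 3$ and an $\ell \times \ell$ invertible matrix $G$
with entries in $\{0,1\}$. Consider a random $\ell$-vector $U_1^\ell$
that is uniformly distributed over $\{0,1\}^\ell$. Let
$X_1^\ell = U_1^\ell G$, where the multiplication is performed over
GF(2). Also, let $Y_1^\ell$ be the output of $\ell$ uses of $W$ with
the input $X_1^\ell$. The channel between $U_1^\ell$ and $Y_1^\ell$
is defined by the transition probabilities

$$W_\ell(y_1^\ell \mid u_1^\ell) \triangleq \prod_{i=1}^{\ell} W(y_i \mid x_i) = \prod_{i=1}^{\ell} W\big(y_i \mid (u_1^\ell G)_i\big). \tag{2}$$

Define $W^{(i)} : \{0,1\} \to \mathcal{Y}^\ell \times \{0,1\}^{i-1}$
as the channel with input $u_i$, output $(y_1^\ell, u_1^{i-1})$ and
transition probabilities

$$W^{(i)}(y_1^\ell, u_1^{i-1} \mid u_i) = \frac{1}{2^{\ell-1}} \sum_{u_{i+1}^\ell} W_\ell(y_1^\ell \mid u_1^\ell), \tag{3}$$

and let $Z^{(i)}$ denote its Bhattacharyya parameter, i.e.,

$$Z^{(i)} = \sum_{y_1^\ell,\, u_1^{i-1}} \sqrt{W^{(i)}(y_1^\ell, u_1^{i-1} \mid 0)\, W^{(i)}(y_1^\ell, u_1^{i-1} \mid 1)}.$$

For $k \ge 1$ let $W^k : \{0,1\} \to \mathcal{Y}^k$ denote the B-DMC
with transition probabilities

$$W^k(y_1^k \mid x) = \prod_{j=1}^{k} W(y_j \mid x).$$

Also let $\tilde{W}^{(i)} : \{0,1\} \to \mathcal{Y}^\ell$ denote the
B-DMC with transition probabilities

$$\tilde{W}^{(i)}(y_1^\ell \mid u_i) = \frac{1}{2^{\ell-i}} \sum_{u_{i+1}^\ell} W_\ell(y_1^\ell \mid 0_1^{i-1}, u_i^\ell). \tag{4}$$

Observation 2: Since $W$ is symmetric, the channels $W^{(i)}$
and $\tilde{W}^{(i)}$ are equivalent in the sense that for any fixed
$u_1^{i-1}$ there exists a permutation
$\pi_{u_1^{i-1}} : \mathcal{Y}^\ell \to \mathcal{Y}^\ell$ such that

$$W^{(i)}(y_1^\ell, u_1^{i-1} \mid u_i) = \frac{1}{2^{i-1}}\, \tilde{W}^{(i)}\big(\pi_{u_1^{i-1}}(y_1^\ell) \mid u_i\big).$$

Finally, let $I^{(i)}$ denote the mutual information between the
input and output of channel $W^{(i)}$. Since $G$ is invertible, it is
easy to check that

$$\sum_{i=1}^{\ell} I^{(i)} = \ell\, I(W).$$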

We will use $C$ to denote a linear code and $d_{\min}(C)$ to denote
its minimum distance. We let $\langle g_1, \dots, g_k \rangle$ denote
the linear code generated by the vectors $g_1, \dots, g_k$. We let
$d_H(a, b)$ denote the Hamming distance between binary vectors $a$
and $b$. We also let $d_H(a, C)$ denote the minimum distance between
a vector $a$ and a code $C$, i.e.,
$d_H(a, C) = \min_{c \in C} d_H(a, c)$.

III. Polarization 

We say that $G$ is a polarizing matrix if there exists an
$i \in \{1, \dots, \ell\}$ for which

$$\tilde{W}^{(i)}(y_1^\ell \mid u_i) = Q(y_{A^c}) \prod_{j \in A} W(y_j \mid u_i) \tag{5}$$

for some $A \subseteq \{1, \dots, \ell\}$ with $|A| = k$, $k \ge 2$,
and a probability distribution
$Q : \mathcal{Y}^{|A^c|} \to [0, 1]$.

In words, a matrix $G$ is polarizing if there exists a bit which
"sees" a channel whose $k$ outputs are equivalent to those of
$k$ independent realizations of the underlying channel, whereas
the remaining $\ell - k$ outputs are independent of the input to the
channel. The reason to call such a $G$ "polarizing" is that, as we
will see shortly, a repeated application of such a transformation
polarizes the underlying channel.

Recall that by assumption $W$ is symmetric. Hence, by
Observation 2, equation (5) implies

$$W^{(i)}(y_1^\ell, u_1^{i-1} \mid u_i) = \frac{Q\big((\pi_{u_1^{i-1}}(y_1^\ell))_{A^c}\big)}{2^{i-1}} \prod_{j \in A} W\big((\pi_{u_1^{i-1}}(y_1^\ell))_j \mid u_i\big), \tag{6}$$

an equivalence we will denote by $W^{(i)} \equiv W^k$. Note that
$W^{(i)} \equiv W^k$ implies $I^{(i)} = I(W^k)$ and
$Z^{(i)} = Z(W^k)$.



We start by claiming that any invertible $\{0, 1\}$ matrix $G$
can be written as a (real) sum $G = P + P'$, where $P$ is a
permutation matrix, and $P'$ is a $\{0, 1\}$ matrix. To see this,
consider a bipartite graph on $2\ell$ nodes. The $\ell$ left nodes
correspond to the rows of the matrix and the $\ell$ right nodes
correspond to the columns of the matrix. Connect left node
$i$ to right node $j$ if $G_{ij} = 1$. The invertibility of $G$
implies that for every subset of rows $\mathcal{R}$ the number of
columns which contain non-zero elements in these rows is at least
$|\mathcal{R}|$. By Hall's Theorem [3, Theorem 16.4.] this
guarantees that there is a perfect matching between the left and
the right nodes of the graph, and this matching represents a
permutation. Therefore, for any invertible matrix $G$, there exists
a column permutation so that all diagonal elements of the permuted
matrix are 1. Note that the transition probabilities defining
$W^{(i)}$ are invariant (up to a permutation of the outputs
$y_1^\ell$) under column permutations on $G$. Therefore, for the
remainder of this section, and without loss of generality, we assume
that $G$ has 1s on its diagonal.
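
The matching argument is constructive. The sketch below finds such a column permutation with a standard augmenting-path search for a bipartite perfect matching; this particular search procedure is our choice for illustration (the text only invokes Hall's Theorem for existence), and the function names are hypothetical.

```python
def diagonal_column_permutation(G):
    """Return perm with G[i][perm[i]] == 1 for every row i, or None.

    Rows are left nodes, columns are right nodes, and an edge (i, j)
    exists when G[i][j] == 1. A perfect matching is built row by row
    using augmenting paths.
    """
    n = len(G)
    row_of_col = [None] * n              # row currently matched to column j

    def try_row(i, seen):
        for j in range(n):
            if G[i][j] == 1 and j not in seen:
                seen.add(j)
                if row_of_col[j] is None or try_row(row_of_col[j], seen):
                    row_of_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, set()):
            return None                  # no perfect matching exists
    perm = [None] * n
    for j, i in enumerate(row_of_col):
        perm[i] = j
    return perm

def permute_columns(G, perm):
    """Column k of the result is column perm[k] of G."""
    return [[row[perm[k]] for k in range(len(perm))] for row in G]

G = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 1]]
perm = diagonal_column_permutation(G)
P = permute_columns(G, perm)
print(perm, P)
```

For an invertible $G$ the function never returns `None`, in line with the argument above; a `None` result certifies that the matrix is singular.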

The following lemma gives necessary and sufficient conditions
for (5) to be satisfied.

Lemma 3 (Channel Transformation for Polarizing Matrices):
Let $W$ be a symmetric B-DMC.

(i) If $G$ is not upper triangular, then there exists an $i$ for
which $W^{(i)} \equiv W^k$ for some $k \ge 2$.

(ii) If $G$ is upper triangular, then $W^{(i)} \equiv W$ for all
$1 \le i \le \ell$.

Proof: Let the number of 1s in the last row of $G$ be $k$.
Clearly $W^{(\ell)} \equiv W^k$. If $k \ge 2$ then $G$ is not upper
triangular and the first claim of the lemma holds. If $k = 1$ then

$$G_{\ell k} = 0 \quad \text{for all } 1 \le k < \ell. \tag{7}$$

One can then write

$$\begin{aligned}
W^{(\ell-i)}(y_1^\ell, u_1^{\ell-i-1} \mid u_{\ell-i})
&= \frac{1}{2^{\ell-1}} \sum_{u_{\ell-i+1}^{\ell}} \Pr[Y_1^\ell = y_1^\ell \mid U_1^\ell = u_1^\ell] \\
&= \frac{1}{2^{\ell-1}} \sum_{u_{\ell-i+1}^{\ell-1},\, u_\ell} \Pr[Y_1^{\ell-1} = y_1^{\ell-1} \mid U_1^\ell = u_1^\ell] \cdot \Pr[Y_\ell = y_\ell \mid Y_1^{\ell-1} = y_1^{\ell-1}, U_1^\ell = u_1^\ell] \\
&= \frac{1}{2^{\ell-1}} \sum_{u_{\ell-i+1}^{\ell-1}} W_{\ell-1}(y_1^{\ell-1} \mid u_1^{\ell-1}) \cdot \sum_{u_\ell} \Pr[Y_\ell = y_\ell \mid Y_1^{\ell-1} = y_1^{\ell-1}, U_1^\ell = u_1^\ell] \\
&= \frac{1}{2^{\ell-1}} \big[W(y_\ell \mid 0) + W(y_\ell \mid 1)\big] \sum_{u_{\ell-i+1}^{\ell-1}} W_{\ell-1}(y_1^{\ell-1} \mid u_1^{\ell-1}).
\end{aligned}$$

Therefore, $Y_\ell$ is independent of the inputs to the channels
$W^{(\ell-i)}$ for $i = 1, \dots, \ell-1$. This is equivalent to
saying that channels $W^{(1)}, \dots, W^{(\ell-1)}$ are defined by
the matrix $G^{(\ell-1)}$, where we define $G^{(\ell-i)}$ as the
$(\ell-i) \times (\ell-i)$ matrix obtained from $G$ by removing its
last $i$ rows and columns. Applying the same argument to
$G^{(\ell-1)}$ and repeating, we see that if $G$ is upper
triangular, then we have $W^{(i)} \equiv W$ for all $i$. On the
other hand, if $G$ is not upper triangular, then there exists an $i$
for which $G^{(\ell-i)}$ has at least two 1s in the last row. This in
turn implies that $W^{(\ell-i)} \equiv W^k$ for some $k \ge 2$. ■
Consider the recursive channel combining operation given
in [1], using a transformation $G$. Recall that $n$ recursions of
this construction are equivalent to applying the transformation
$A_n G^{\otimes n}$ to $U_1^{\ell^n}$, where
$A_n : \{1, \dots, \ell^n\} \to \{1, \dots, \ell^n\}$ is a
permutation defined analogously to the bit-reversal operation
in [1].

Theorem 4 (Polarization of Symmetric B-DMCs): Given a
symmetric B-DMC $W$ and an $\ell \times \ell$ transformation $G$,
consider the channels $W^{(i)}$, $i \in \{1, \dots, \ell^n\}$,
defined by the transformation $A_n G^{\otimes n}$.

(i) If $G$ is polarizing, then for any $\delta > 0$

$$\lim_{n \to \infty} \frac{\big|\{i \in \{1, \dots, \ell^n\} : I(W^{(i)}) \in (\delta, 1-\delta)\}\big|}{\ell^n} = 0, \tag{8}$$

$$\lim_{n \to \infty} \frac{\big|\{i \in \{1, \dots, \ell^n\} : Z(W^{(i)}) \in (\delta, 1-\delta)\}\big|}{\ell^n} = 0. \tag{9}$$

(ii) If $G$ is not polarizing, then for all $n$ and
$i \in \{1, \dots, \ell^n\}$

$$I(W^{(i)}) = I(W), \qquad Z(W^{(i)}) = Z(W).$$
In [1, Section 6], Arikan proves part (i) of Theorem 4 for
$G = G_2$. His proof involves defining a random variable $W_n$
that is uniformly distributed over the set
$\{W^{(i)}\}_{i=1}^{\ell^n}$ (where $\ell = 2$ for the case
$G = G_2$), which implies

$$\Pr[I(W_n) \in (a, b)] = \frac{\big|\{i \in \{1, \dots, \ell^n\} : I(W^{(i)}) \in (a, b)\}\big|}{\ell^n}, \tag{10}$$

$$\Pr[Z(W_n) \in (a, b)] = \frac{\big|\{i \in \{1, \dots, \ell^n\} : Z(W^{(i)}) \in (a, b)\}\big|}{\ell^n}. \tag{11}$$

Following Arikan, we define the random variable
$W_n \in \{W^{(i)}\}_{i=1}^{\ell^n}$ for our purpose through a tree
process $\{W_n; n \ge 0\}$ with

$$W_0 = W, \qquad W_{n+1} = W_n^{(B_{n+1})},$$

where $\{B_n; n \ge 1\}$ is a sequence of i.i.d. random variables
defined on a probability space $(\Omega, \mathcal{F}, \mu)$, and
where $B_n$ is uniformly distributed over the set
$\{1, \dots, \ell\}$. Defining $\mathcal{F}_0 = \{\emptyset, \Omega\}$
and $\mathcal{F}_n = \sigma(B_1, \dots, B_n)$ for $n \ge 1$, we
augment the above process by the processes
$\{I_n; n \ge 0\} := \{I(W_n); n \ge 0\}$ and
$\{Z_n; n \ge 0\} := \{Z(W_n); n \ge 0\}$. It is easy to verify
that these processes satisfy (10) and (11).

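Observation 5: $\{(I_n, \mathcal{F}_n)\}$ is a bounded martingale
and therefore converges w.p. 1 and in $\mathcal{L}^1$ to a random
variable $I_\infty$.

Lemma 6 ($I_\infty$): If $G$ is polarizing, then

$$I_\infty = \begin{cases} 1 & \text{w.p. } I(W), \\ 0 & \text{w.p. } 1 - I(W). \end{cases}$$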



Proof: For any polarizing transformation $G$, Lemma 3
implies that there exists an $i \in \{1, \dots, \ell\}$ and
$k \ge 2$ for which

$$I^{(i)} = I(W^k). \tag{12}$$

This implies that for the tree process defined above, we have

$$I_{n+1} = I(W_n^k) \quad \text{with probability at least } \frac{1}{\ell},$$

for some $k \ge 2$. Moreover by the convergence in $\mathcal{L}^1$
of $I_n$, we have $E[|I_{n+1} - I_n|] \to 0$ as $n \to \infty$. This
in turn implies

$$E[|I_{n+1} - I_n|] \ge \frac{1}{\ell}\, E\big[I(W_n^k) - I(W_n)\big] \to 0. \tag{13}$$

It is shown in Lemma 33 in the Appendix that for any
symmetric B-DMC $W_n$, if $I(W_n) \in (\delta, 1-\delta)$ for some
$\delta > 0$, then there exists an $\eta(\delta) > 0$ such that
$I(W_n^k) - I(W_n) > \eta(\delta)$. Therefore, convergence in (13)
implies $I_\infty \in \{0, 1\}$ w.p. 1. The claim on the probability
distribution of $I_\infty$ follows from the fact that $\{I_n\}$ is a
martingale, i.e., $E[I_\infty] = E[I_0] = I(W)$. ■

Proof of Theorem 4: Note that for any $n$ the fraction in (8)
is equal to $\Pr[I_n \in (\delta, 1-\delta)]$. Combined with
Lemma 6, this implies (8).

For any B-DMC $Q$, $I(Q)$ and $Z(Q)$ satisfy [1]

$$I(Q)^2 + Z(Q)^2 \le 1,$$

$$I(Q) + Z(Q) \ge 1.$$

When $I(Q)$ takes on the value 0 or 1, these two inequalities
imply that $Z(Q)$ takes on the value 1 or 0, respectively. From
Lemma 6 we know that $\{I_n\}$ converges to $I_\infty$ w.p. 1 and
$I_\infty \in \{0, 1\}$. This implies that $\{Z_n\}$ converges
w.p. 1 to a random variable $Z_\infty$ and

$$Z_\infty = \begin{cases} 0 & \text{w.p. } I(W), \\ 1 & \text{w.p. } 1 - I(W). \end{cases}$$

This proves the first part of the theorem. The second part
follows from Lemma 3, (ii). ■
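
Polarization can be observed numerically in the one case where the channel parameters evolve in closed form. The sketch below assumes $W$ is a binary erasure channel and $G = G_2$; for that pair it is known from [1] that one step maps the erasure probability (which equals $Z$) to the two values $2Z - Z^2$ and $Z^2$. This recursion is imported from [1] and is not derived in this paper; the function names are illustrative.

```python
def bec_polarize(eps, n):
    """Bhattacharyya parameters of the 2**n channels after n steps of G2."""
    z = [eps]
    for _ in range(n):
        # each channel splits into a worse and a better one
        z = [v for e in z for v in (2 * e - e * e, e * e)]
    return z

def fraction_unpolarized(z, delta):
    """Fraction of channels with parameter in (delta, 1 - delta), cf. (9)."""
    return sum(delta < e < 1 - delta for e in z) / len(z)

eps = 0.5
for n in (4, 8, 12, 16):
    z = bec_polarize(eps, n)
    print(n, fraction_unpolarized(z, 0.01), sum(z) / len(z))
```

The last printed column is the average of the parameters, which stays at the initial erasure probability for every $n$; this is the martingale property used in the proof, specialised to the erasure channel.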

Remark 7: Arikan's proof for part (i) of Theorem 4 with
$G = G_2$ proceeds by first showing the convergence of $\{Z_n\}$,
instead of $\{I_n\}$. This is accomplished by showing that for
the matrix $G_2$ the resulting process $\{Z_n\}$ is a
supermartingale. Such a property is in general difficult to prove
for arbitrary $G$. On the other hand, the process $\{I_n\}$ is a
martingale for any invertible matrix $G$, which is sufficient to
ensure convergence.

Theorem 4 guarantees that repeated application of a polarizing
matrix $G$ polarizes the underlying channel $W$, i.e., the
resulting channels $W^{(i)}$, $i \in \{1, \dots, \ell^n\}$, tend
towards either a noiseless or a completely noisy channel. Lemma 6
ensures that the fraction of noiseless channels is indeed $I(W)$.
This suggests using the noiseless channels for transmitting
information while transmitting no information over the noisy
channels [1]. Let $\mathcal{A} \subset \{1, \dots, \ell^n\}$ denote
the set of channels $W^{(i)}$ used for transmitting the information
bits. Since $Z^{(i)}$ upper bounds the error probability of decoding
bit $U_i$ with the knowledge of $U_1^{i-1}$, the block error
probability of such a transmission scheme under successive
cancellation decoding can be upper bounded as [1]

$$P_B \le \sum_{i \in \mathcal{A}} Z^{(i)}. \tag{14}$$

Further, the block error probability can also be lower bounded
in terms of the $Z^{(i)}$s: Consider a symmetric B-DMC with
Bhattacharyya parameter $Z$, and let $P_e$ denote the bit error
probability of uncoded transmission over this channel. It is
known that

$$P_e \ge \frac{1}{2}\big(1 - \sqrt{1 - Z^2}\big).$$

A proof of this fact is provided in the Appendix. Under
successive cancellation decoding, the block error probability
is lower bounded by each of the bit error probabilities over
the channels $W^{(i)}$. Therefore the former quantity can be lower
bounded by

$$P_B \ge \max_{i \in \mathcal{A}} \frac{1}{2}\Big(1 - \sqrt{1 - (Z^{(i)})^2}\Big). \tag{15}$$

Both the above upper and lower bounds to the block error
probability look somewhat loose at first. However, as
we shall see later, these bounds are sufficiently tight for our
purposes. Therefore, it suffices to analyze the behavior of the
$Z^{(i)}$s.
IV. Rate of Polarization 

For the matrix $G_2$ Arikan shows that, combined with successive
cancellation decoding, these codes achieve a vanishing
block error probability for any rate strictly less than $I(W)$.
Moreover, it is shown in [2] that when $Z_n$ approaches 0 it
does so at a sufficiently fast rate:

Theorem 8 ([2]): Given a B-DMC $W$, the matrix $G_2$ and
any $\beta < \frac{1}{2}$,

$$\lim_{n \to \infty} \Pr\big[Z_n \le 2^{-2^{n\beta}}\big] = I(W).$$

A similar result for arbitrary $G$ is given in the following
theorem.

Theorem 9 (Universal Bound on Rate of Polarization):
Given a symmetric B-DMC $W$, an $\ell \times \ell$ polarizing matrix
$G$, and any $\beta < \frac{\log_\ell 2}{\ell}$,

$$\lim_{n \to \infty} \Pr\big[Z_n \le 2^{-\ell^{n\beta}}\big] = I(W).$$

Proof Idea: For any polarizing matrix it can be shown that
$Z_{n+1} \le \ell Z_n$ with probability 1 and that
$Z_{n+1} \le Z_n^2$ with probability at least $1/\ell$. The proof
then follows by adapting the proof of [2, Theorem 3]. ■

The above estimation of the probability is universal and is 
independent of the exact structure of G. We are now interested 
in a more precise estimate of this probability. The results in 
this section are the natural generalization of those in [2]. 

Definition 10 (Rate of Polarization): For any B-DMC $W$
with $0 < I(W) < 1$, we will say that an $\ell \times \ell$ matrix
$G$ has rate of polarization $E(G)$ if

(i) For any fixed $\beta < E(G)$,

$$\liminf_{n \to \infty} \Pr\big[Z_n \le 2^{-\ell^{n\beta}}\big] = I(W).$$

(ii) For any fixed $\beta > E(G)$,

$$\liminf_{n \to \infty} \Pr\big[Z_n \ge 2^{-\ell^{n\beta}}\big] = 1.$$

For convenience, in the rest of the paper we refer to $E(G)$ as
the exponent of the matrix $G$.



The definition of exponent provides a meaningful performance
measure of polar codes under successive cancellation
decoding. This can be seen as follows: Consider a matrix $G$
with exponent $E(G)$. Fix $0 < R < I(W)$ and $\beta < E(G)$.
Definition 10 (i) implies that for $n$ sufficiently large there
exists a set $\mathcal{A}$ of size $\ell^n R$ such that
$\sum_{i \in \mathcal{A}} Z^{(i)} \le 2^{-\ell^{n\beta}}$.
Using set $\mathcal{A}$ as the set of information bits, the block
error probability under successive cancellation decoding $P_B$ can
be bounded using (14) as

$$P_B \le 2^{-\ell^{n\beta}}.$$

Conversely, consider $R > 0$ and $\beta > E(G)$. Definition 10 (ii)
implies that for $n$ sufficiently large, any set $\mathcal{A}$ of
size $\ell^n R$ will satisfy
$\max_{i \in \mathcal{A}} Z^{(i)} > 2^{-\ell^{n\beta}}$. Using (15)
the block error probability can be lower bounded as

$$P_B \ge 2^{-\ell^{n\beta}}.$$

It turns out, and it will be shown later, that the exponent 
is independent of the channel W. Indeed, we will show in 
Theorem 14 that the exponent E(G) can be expressed as a 
function of the partial distances of G. 

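Definition 11 (Partial Distances): Given an $\ell \times \ell$
matrix $G = [g_1^T, \dots, g_\ell^T]^T$, we define the partial
distances $D_i$, $i = 1, \dots, \ell$ as

$$D_i \triangleq d_H(g_i, \langle g_{i+1}, \dots, g_\ell \rangle), \qquad i = 1, \dots, \ell - 1,$$

$$D_\ell \triangleq d_H(g_\ell, 0).$$

Example 12: The partial distances of the matrix

$$F = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

are $D_1 = 1$, $D_2 = 1$, $D_3 = 3$.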
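
For small matrices the partial distances can be computed by brute force directly from Definition 11. The sketch below enumerates every codeword of the code spanned by the rows below row $i$, so it is only practical for small sizes. It takes the matrix of Example 12 to have rows (1,0,0), (1,0,1), (1,1,1), which is an assumption that reproduces the stated distances; the function names are illustrative.

```python
from itertools import product

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def span(rows, length):
    """All GF(2) linear combinations of rows (only the zero word if empty)."""
    words = []
    for coeffs in product((0, 1), repeat=len(rows)):
        word = [0] * length
        for c, row in zip(coeffs, rows):
            if c:
                word = [w ^ r for w, r in zip(word, row)]
        words.append(word)
    return words

def partial_distances(G):
    """D_i = d_H(g_i, <g_{i+1}, ..., g_l>); the last one is the weight of g_l."""
    n = len(G[0])
    return [min(hamming_distance(G[i], c) for c in span(G[i + 1:], n))
            for i in range(len(G))]

F = [[1, 0, 0],
     [1, 0, 1],
     [1, 1, 1]]
print(partial_distances(F))   # [1, 1, 3]
```

The cost grows as $2^{\ell - i}$ for row $i$, which is one reason the later sections work with bounds rather than exhaustive evaluation.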

In order to establish the relationship between $E(G)$ and the
partial distances of $G$ we consider the Bhattacharyya parameters
$Z^{(i)}$ of the channels $W^{(i)}$. These parameters depend on $G$
as well as on $W$. The exact relationship with respect to $W$ is
difficult to compute in general. However, there are sufficiently
tight upper and lower bounds on the $Z^{(i)}$s in terms of $Z(W)$,
the Bhattacharyya parameter of $W$.

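Lemma 13 (Bhattacharyya Parameter and Partial Distance):
For any symmetric B-DMC $W$ and any $\ell \times \ell$ matrix $G$
with partial distances $\{D_i\}_{i=1}^{\ell}$

$$Z(W)^{D_i} \le Z^{(i)} \le 2^{\ell - i}\, Z(W)^{D_i}. \tag{16}$$

Proof: To prove the upper bound we write

$$\begin{aligned}
Z^{(i)} &= \sum_{y_1^\ell,\, u_1^{i-1}} \sqrt{W^{(i)}(y_1^\ell, u_1^{i-1} \mid 0)\, W^{(i)}(y_1^\ell, u_1^{i-1} \mid 1)} \\
&= \frac{1}{2^{\ell-1}} \sum_{y_1^\ell,\, u_1^{i-1}} \sqrt{\sum_{v_{i+1}^\ell,\, w_{i+1}^\ell} W_\ell(y_1^\ell \mid u_1^{i-1}, 0, v_{i+1}^\ell)\, W_\ell(y_1^\ell \mid u_1^{i-1}, 1, w_{i+1}^\ell)} \\
&\le \frac{1}{2^{\ell-1}} \sum_{y_1^\ell,\, u_1^{i-1}} \ \sum_{v_{i+1}^\ell,\, w_{i+1}^\ell} \sqrt{W_\ell(y_1^\ell \mid u_1^{i-1}, 0, v_{i+1}^\ell)} \cdot \sqrt{W_\ell(y_1^\ell \mid u_1^{i-1}, 1, w_{i+1}^\ell)}.
\end{aligned} \tag{17}$$

Let $c_0 = (u_1^{i-1}, 0, v_{i+1}^\ell) G$ and
$c_1 = (u_1^{i-1}, 1, w_{i+1}^\ell) G$. Let $S_0$ ($S_1$) be the set
of indices where both $c_0$ and $c_1$ are equal to 0 (1). Let $S^c$
be the complement of $S_0 \cup S_1$. We have

$$|S^c| = d_H(c_0, c_1) \ge D_i.$$

Now, (17) can be rewritten as

$$\begin{aligned}
Z^{(i)} &\le \frac{1}{2^{\ell-1}} \sum_{u_1^{i-1}} \ \sum_{v_{i+1}^\ell,\, w_{i+1}^\ell} \ \prod_{j \in S_0} \sum_{y_j} W(y_j \mid 0) \prod_{j \in S_1} \sum_{y_j} W(y_j \mid 1) \prod_{j \in S^c} \sum_{y_j} \sqrt{W(y_j \mid 0)\, W(y_j \mid 1)} \\
&\le \frac{1}{2^{\ell-1}} \sum_{u_1^{i-1},\, v_{i+1}^\ell,\, w_{i+1}^\ell} Z(W)^{D_i} \\
&= 2^{\ell - i}\, Z(W)^{D_i}.
\end{aligned}$$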

For the lower bound on $Z^{(i)}$, first note that by Observation
2, we have $Z(W^{(i)}) = Z(\tilde{W}^{(i)})$. Therefore it suffices
to show the claim for the channel $\tilde{W}^{(i)}$. Let
$G = [g_1^T, \dots, g_\ell^T]^T$. Then using (2), (3) and (4),
$\tilde{W}^{(i)}$ can be written as

$$\tilde{W}^{(i)}(y_1^\ell \mid u_i) = \frac{1}{2^{\ell-i}} \sum_{x_1^\ell \in \mathcal{A}(u_i)} \ \prod_{k=1}^{\ell} W(y_k \mid x_k), \tag{18}$$

where $x_1^\ell \in \mathcal{A}(u_i) \subset \{0, 1\}^\ell$ if and
only if for some $u_{i+1}^\ell \in \{0, 1\}^{\ell-i}$

$$x_1^\ell = u_i g_i + \sum_{j=i+1}^{\ell} u_j g_j. \tag{19}$$

Consider the code $\langle g_{i+1}, \dots, g_\ell \rangle$ and let
$\sum_{j=i+1}^{\ell} \alpha_j g_j$ be a codeword satisfying
$d_H(g_i, \sum_{j=i+1}^{\ell} \alpha_j g_j) = D_i$. Due to the
linearity of the code $\langle g_{i+1}, \dots, g_\ell \rangle$, one
can equivalently say that $x_1^\ell \in \mathcal{A}(u_i)$ if and
only if

$$x_1^\ell = u_i \Big(g_i + \sum_{j=i+1}^{\ell} \alpha_j g_j\Big) + \sum_{j=i+1}^{\ell} u_j g_j. \tag{20}$$

Now let $g_i' = g_i + \sum_{j=i+1}^{\ell} \alpha_j g_j$ and
$G' = [g_1^T, \dots, g_{i-1}^T, g_i'^T, g_{i+1}^T, \dots, g_\ell^T]^T$.
Equations (19) and (20) show that the channels $\tilde{W}^{(i)}$
defined by the matrices $G$ and $G'$ are equivalent. Note that $G'$
has the property that the Hamming weight of $g_i'$ is equal to
$D_i$.

We will now consider a channel $W_g^{(i)}$ where a genie provides
extra information to the decoder. Since $\tilde{W}^{(i)}$ is degraded
with respect to the genie-aided channel $W_g^{(i)}$, and since the
ordering of the Bhattacharyya parameter is preserved under
degradation, it suffices to find a genie-aided channel for which
$Z(W_g^{(i)}) = Z(W)^{D_i}$.

Consider a genie which reveals the bits $u_{i+1}^\ell$ to the
decoder (Figure 2). With the knowledge of $u_{i+1}^\ell$ the
decoder's task reduces to finding the value of any of the
transmitted bits $x_j$ for which $g_{ij}' = 1$. Since each bit $x_j$
goes through an independent copy of $W$, and since the weight of
$g_i'$ is equal to $D_i$, the resulting channel $W_g^{(i)}$ is
equivalent to $D_i$ independent copies of $W$. Hence,
$Z(W_g^{(i)}) = Z(W)^{D_i}$. ■

Fig. 2. Genie-aided channel $W_g^{(i)}$.

Lemma 13 shows that the link between $Z^{(i)}$ and $Z(W)$ is
given in terms of the partial distances of $G$. This link is
sufficiently strong to completely characterize $E(G)$.

Theorem 14 (Exponent from Partial Distances): For any
symmetric B-DMC $W$ and any $\ell \times \ell$ matrix $G$ with
partial distances $\{D_i\}_{i=1}^{\ell}$, the rate of polarization
$E(G)$ is given by

$$E(G) = \frac{1}{\ell} \sum_{i=1}^{\ell} \log_\ell D_i. \tag{21}$$

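Proof: The proof is similar to that of [2, Theorem 3].
We highlight the main idea and omit the details.

First note that by Lemma 13 we have
$Z_j \ge Z_{j-1}^{D_{B_j}}$. Let
$m_i = |\{1 \le j \le n : B_j = i\}|$. Writing $Z = Z(W)$, we then
obtain

$$Z_n \ge Z^{\prod_i D_i^{m_i}} = Z^{\ell^{\sum_i m_i \log_\ell D_i}}. \tag{22}$$

The exponent of $Z$ on the right-hand side of (22) can be
rewritten as

$$\ell^{\sum_i m_i \log_\ell D_i} = \ell^{\, n \sum_i \frac{m_i}{n} \log_\ell D_i}.$$

By the law of large numbers, for any $\epsilon > 0$,

$$\Big| \frac{m_i}{n} - \frac{1}{\ell} \Big| \le \epsilon$$

with high probability for $n$ sufficiently large. This proves part
(ii) of the definition of $E(G)$, i.e., for any
$\beta > \frac{1}{\ell} \sum_i \log_\ell D_i$,

$$\lim_{n \to \infty} \Pr\big[Z_n \ge 2^{-\ell^{n\beta}}\big] = 1.$$

The proof for part (i) of the definition follows using similar
arguments as above, and by noting that
$Z_j \le 2^{\ell - B_j} Z_{j-1}^{D_{B_j}}$. The constant
$2^{\ell - B_j}$ can be taken care of using the 'bootstrapping'
argument of [2]. ■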
Example 15: For the matrix $F$ considered in Example 12,
we have

$$E(F) = \frac{1}{3}\big(\log_3 1 + \log_3 1 + \log_3 3\big) = \frac{1}{3}.$$

V. Bounds on the Exponent 

For the matrix $G_2$, we have $E(G_2) = \frac{1}{2}$. Note that for
the case of $2 \times 2$ matrices, the only polarizing matrix is
$G_2$. In order to address the question of whether the rate of
polarization can be improved by considering large matrices, we
define

$$E_\ell \triangleq \max_{G \in \{0,1\}^{\ell \times \ell}} E(G). \tag{23}$$

Theorem 14 facilitates the computation of $E_\ell$ by providing an
expression for $E(G)$ in terms of the partial distances of $G$.
Lemmas 16 and 18 below provide further simplification for
computing (23).

Lemma 16 (Gilbert-Varshamov Inequality for Linear Codes):
Let $C$ be a binary linear code of length $\ell$ and
$d_{\min}(C) = d_1$. Let $g \in \{0, 1\}^\ell$ and let
$d_H(g, C) = d_2$. Let $C'$ be the linear code obtained by adding
the vector $g$ to $C$, i.e., $C' = \langle g, C \rangle$.
Then $d_{\min}(C') = \min\{d_1, d_2\}$.

Proof: Since $C'$ is a linear code, its codewords are of the
form $c + \alpha g$ where $c \in C$, $\alpha \in \{0, 1\}$.
Therefore

$$\begin{aligned}
d_{\min}(C') &= \min\Big\{ \min_{c \in C,\, c \ne 0} d_H(0, c),\ \min_{c \in C} d_H(0, c + g) \Big\} \\
&= \min\Big\{ \min_{c \in C,\, c \ne 0} d_H(0, c),\ \min_{c \in C} d_H(g, c) \Big\} \\
&= \min\{d_1, d_2\}.
\end{aligned}$$

■

Corollary 17: Given a set of vectors $g_1, \dots, g_k$ with partial
distances $D_j = d_H(g_j, \langle g_{j+1}, \dots, g_k \rangle)$,
$j = 1, \dots, k$, the minimum distance of the linear code
$\langle g_1, \dots, g_k \rangle$ is given by
$\min_{j=1}^{k} \{D_j\}$.

The maximization problem in (23) is not feasible in practice
even for $\ell = 10$. The following lemma allows us to restrict this
maximization to a smaller set of matrices. Even though the
maximization problem still remains intractable, by working
on this restricted set, we obtain lower and upper bounds on
$E_\ell$.

Lemma 18 (Partial Distances Should Decrease): Let
$G = [g_1^T \dots g_\ell^T]^T$. Fix $k \in \{1, \dots, \ell - 1\}$
and let $G' = [g_1^T \dots g_{k+1}^T\, g_k^T \dots g_\ell^T]^T$ be
the matrix obtained from $G$ by swapping $g_k$ and $g_{k+1}$. Let
$\{D_i\}_{i=1}^{\ell}$ and $\{D_i'\}_{i=1}^{\ell}$ denote the
partial distances of $G$ and $G'$ respectively. If
$D_k > D_{k+1}$, then

(i) $E(G') \ge E(G)$,

(ii) $D_{k+1}' > D_k'$.

Proof: Note first that $D_i = D_i'$ if $i \notin \{k, k+1\}$.
Therefore, to prove the first claim, it suffices to show that
$D_k' D_{k+1}' \ge D_k D_{k+1}$. To that end, write

$$\begin{aligned}
D_k' &= d_H(g_{k+1}, \langle g_k, g_{k+2}, \dots, g_\ell \rangle), \\
D_k &= d_H(g_k, \langle g_{k+1}, \dots, g_\ell \rangle), \\
D_{k+1}' &= d_H(g_k, \langle g_{k+2}, \dots, g_\ell \rangle), \\
D_{k+1} &= d_H(g_{k+1}, \langle g_{k+2}, \dots, g_\ell \rangle),
\end{aligned}$$

and observe that $D_{k+1}' \ge D_k$ since
$\langle g_{k+2}, \dots, g_\ell \rangle$ is a sub-code of
$\langle g_{k+1}, \dots, g_\ell \rangle$. $D_k'$ can be computed as

$$\begin{aligned}
D_k' &= \min\Big\{ \min_{c \in \langle g_{k+2}, \dots, g_\ell \rangle} d_H(g_{k+1}, c),\ \min_{c \in \langle g_{k+2}, \dots, g_\ell \rangle} d_H(g_{k+1}, c + g_k) \Big\} \\
&= \min\Big\{ D_{k+1},\ \min_{c \in \langle g_{k+2}, \dots, g_\ell \rangle} d_H(g_k, c + g_{k+1}) \Big\} \\
&= D_{k+1},
\end{aligned}$$

where the last equality follows from

$$\min_{c \in \langle g_{k+2}, \dots, g_\ell \rangle} d_H(g_k, c + g_{k+1}) \ge \min_{c \in \langle g_{k+1}, g_{k+2}, \dots, g_\ell \rangle} d_H(g_k, c) = D_k > D_{k+1}.$$

Therefore, $D_k' D_{k+1}' \ge D_k D_{k+1}$, which proves the first
claim. The second claim follows from the inequality
$D_{k+1}' \ge D_k > D_{k+1} = D_k'$. ■

Corollary 19: In the definition of $E_\ell$ (23), the maximization
can be restricted to the matrices $G$ which satisfy
$D_1 \le D_2 \le \dots \le D_\ell$.
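
The row-swapping argument can be tried out on a small matrix. The sketch below evaluates the exponent through the partial-distance formula (21), i.e. $E(G) = \frac{1}{\ell}\sum_i \log_\ell D_i$, and repeatedly swaps adjacent rows $k$, $k+1$ whenever $D_k > D_{k+1}$. The brute-force distance computation, the function names and the $3 \times 3$ test matrix are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import product

def partial_distances(G):
    """D_i = minimum weight of g_i + c over all c in <g_{i+1}, ..., g_l>."""
    ell = len(G)
    dists = []
    for i in range(ell):
        best = None
        for coeffs in product((0, 1), repeat=ell - i - 1):
            word = list(G[i])
            for c, row in zip(coeffs, G[i + 1:]):
                if c:
                    word = [w ^ r for w, r in zip(word, row)]
            weight = sum(word)
            best = weight if best is None else min(best, weight)
        dists.append(best)
    return dists

def exponent(G):
    """E(G) from the partial distances; requires every D_i >= 1."""
    ell = len(G)
    return sum(math.log(d, ell) for d in partial_distances(G)) / ell

def sort_rows(G):
    """Swap adjacent rows k, k+1 while D_k > D_{k+1}."""
    G = [list(row) for row in G]
    while True:
        D = partial_distances(G)
        k = next((k for k in range(len(G) - 1) if D[k] > D[k + 1]), None)
        if k is None:
            return G
        G[k], G[k + 1] = G[k + 1], G[k]

G = [[1, 0, 0],
     [1, 1, 1],
     [0, 1, 0]]
H = sort_rows(G)
print(partial_distances(G), exponent(G))
print(partial_distances(H), exponent(H))
```

In this example the partial distances go from (1, 2, 1) to (1, 1, 3), so a single swap both orders the distances and raises the exponent.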

A. Lower Bound 

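The following lemma provides a lower bound on $E_\ell$ by using
a Gilbert-Varshamov type construction.

Lemma 20 (Gilbert-Varshamov Bound):

$$E_\ell \ge \frac{1}{\ell} \sum_{i=1}^{\ell} \log_\ell \tilde{D}_i,$$

where

$$\tilde{D}_i = \max\Big\{ D : \sum_{j=0}^{D-1} \binom{\ell}{j} < 2^{i} \Big\}. \tag{24}$$

Proof: We will construct a matrix
$G = [g_1^T, \dots, g_\ell^T]^T$ with partial distances
$D_i = \tilde{D}_i$. Let $S(c, d)$ denote the set of binary vectors
with Hamming distance at most $d$ from $c \in \{0, 1\}^\ell$, i.e.,

$$S(c, d) = \{x \in \{0, 1\}^\ell : d_H(x, c) \le d\}.$$

To construct the $i$th row of $G$ with partial distance $D_i$, we
will find a $v \in \{0, 1\}^\ell$ satisfying
$d_H(v, \langle g_{i+1}, \dots, g_\ell \rangle) = D_i$ and set
$g_i = v$. Such a $v$ satisfies $v \notin S(c, D_i - 1)$ for all
$c \in \langle g_{i+1}, \dots, g_\ell \rangle$ and exists if the
sets $S(c, D_i - 1)$, $c \in \langle g_{i+1}, \dots, g_\ell \rangle$,
do not cover $\{0, 1\}^\ell$. The latter condition is satisfied if

$$\Big| \bigcup_{c \in \langle g_{i+1}, \dots, g_\ell \rangle} S(c, D_i - 1) \Big| \le \sum_{c \in \langle g_{i+1}, \dots, g_\ell \rangle} |S(c, D_i - 1)| = 2^{\ell - i} \sum_{j=0}^{D_i - 1} \binom{\ell}{j} < 2^\ell,$$

which is guaranteed by (24). ■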
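
The bound is cheap to evaluate numerically. The sketch below assumes the threshold in (24) is the one required by the covering argument in the proof, namely that row $i$ (counted from the top) gets the largest $D$ with $\sum_{j=0}^{D-1} \binom{\ell}{j} < 2^{i}$; the function names are illustrative, and no particular output values are claimed here.

```python
import math

def gv_partial_distances(ell):
    """Largest D with sum_{j<D} C(ell, j) < 2**i, for i = 1, ..., ell."""
    dists = []
    for i in range(1, ell + 1):
        d, ball = 0, 0               # ball = sum_{j<d} C(ell, j)
        # accept D = d + 1 as long as the enlarged sum stays below 2**i
        while ball + math.comb(ell, d) < 2 ** i:
            ball += math.comb(ell, d)
            d += 1
        dists.append(d)
    return dists

def gv_lower_bound(ell):
    """(1/ell) * sum_i log_ell(D_i) for the distances above."""
    return sum(math.log(d, ell) for d in gv_partial_distances(ell)) / ell

for ell in (2, 16, 64, 128):
    print(ell, gv_lower_bound(ell))
```

For $\ell = 2$ the distances are (1, 2) and the bound equals $\frac{1}{2}$, matching $E(G_2)$.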
The solid line in Figure 3 shows the lower bound of Lemma
20. The bound exceeds $\frac{1}{2}$ for $\ell = 85$, suggesting
that the exponent can be improved by considering large matrices. In
fact, the lower bound tends to 1 when $\ell$ tends to infinity:

Fig. 3. The solid curve shows the lower bound on $E_\ell$ as
described by Lemma 20. The dashed curve corresponds to the upper
bound on $E_\ell$ according to Lemma 26. The points show the
performance of the best matrices obtained by the procedure
described in Section VI.

Lemma 21 (Exponent 1 is Achievable):
$\lim_{\ell \to \infty} E_\ell = 1$.

Proof: Fix $\alpha \in (0, \frac{1}{2})$. Let $\{\tilde{D}_j\}$ be
defined as in Lemma 20. It is known that
$\tilde{D}_{\lceil \alpha \ell \rceil}$ in (24) satisfies
$\lim_{\ell \to \infty} \frac{1}{\ell} \tilde{D}_{\lceil \alpha \ell \rceil} \ge h^{-1}(\alpha)$,
where $h(\cdot)$ is the binary entropy function. Therefore, there
exists an $\ell_0(\alpha) < \infty$ such that for all
$\ell \ge \ell_0(\alpha)$ we have
$\tilde{D}_{\lceil \alpha \ell \rceil} \ge \frac{1}{2} \ell\, h^{-1}(\alpha)$.
Hence, for $\ell \ge \ell_0(\alpha)$ we can write

$$\begin{aligned}
E_\ell &\ge \frac{1}{\ell} \sum_{i = \lceil \alpha \ell \rceil}^{\ell} \log_\ell \tilde{D}_i \\
&\ge \frac{1}{\ell} (1 - \alpha)\, \ell\, \log_\ell \frac{\ell\, h^{-1}(\alpha)}{2} \\
&= 1 - \alpha + (1 - \alpha) \log_\ell \frac{h^{-1}(\alpha)}{2},
\end{aligned}$$

where the first inequality follows from Lemma 20, and the
second inequality follows from the fact that
$\tilde{D}_i \le \tilde{D}_{i+1}$ for all $i$. Therefore we obtain

$$\liminf_{\ell \to \infty} E_\ell \ge 1 - \alpha \quad \forall \alpha \in \Big(0, \frac{1}{2}\Big). \tag{25}$$

Also, since $D_i \le \ell$ for all $i$, we have $E_\ell \le 1$ for
all $\ell$. Hence,

$$\limsup_{\ell \to \infty} E_\ell \le 1. \tag{26}$$

Combining (25) and (26) concludes the proof. ■

B. Upper Bound 

Corollary 19 says that for any $\ell$, there exists a matrix with
$D_1 \le \dots \le D_\ell$ that achieves the exponent $E_\ell$.
Therefore, to obtain upper bounds on $E_\ell$, it suffices to bound
the exponent achievable by this restricted class of matrices. The
partial distances of these matrices can be bounded easily as shown
in the following lemma.

Lemma 22 (Upper Bound on Exponent): Let $d(n, k)$ denote
the largest possible minimum distance of a binary code
of length $n$ and dimension $k$. Then,

$$E_\ell \le \frac{1}{\ell} \sum_{i=1}^{\ell} \log_\ell d(\ell, \ell - i + 1).$$

Proof: Let $G$ be an $\ell \times \ell$ matrix with partial
distances $\{D_i\}_{i=1}^{\ell}$ such that $E(G) = E_\ell$.
Corollary 19 lets us assume without loss of generality that
$D_i \le D_{i+1}$ for all $i$. We therefore obtain

$$D_i = \min_{j \ge i} D_j = d_{\min}(\langle g_i, \dots, g_\ell \rangle) \le d(\ell, \ell - i + 1),$$

where the second equality follows from Corollary 17. ■

Lemma 22 allows us to use existing bounds on the minimum
distances of binary codes to bound $E_\ell$:



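Example 23 (Sphere Packing Bound): Applying the sphere
packing bound for $d(\ell, \ell - i + 1)$ in Lemma 22, we get

$$E_\ell \le \frac{1}{\ell} \sum_{i=1}^{\ell} \log_\ell \tilde{D}_i, \tag{27}$$

where

$$\tilde{D}_i = \max\Big\{ D : \sum_{j=0}^{\lfloor \frac{D-1}{2} \rfloor} \binom{\ell}{j} \le 2^{i-1} \Big\}.$$

Note that for small values of $n$ for which $d(n, k)$ is known for
all $k \le n$, the bound in Lemma 22 can be evaluated exactly.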

C. Improved Upper Bound 

Bounds given in Section V-B relate the partial distances
$\{D_i\}$ to minimum distances of linear codes, but are loose
since they do not exploit the dependence among the $\{D_i\}$.
In order to improve the upper bound we use the following
parametrization: Consider an $\ell \times \ell$ matrix
$G = [g_1^T, \dots, g_\ell^T]^T$. Let

$$T_i = \{k : g_{ik} = 1,\ g_{jk} = 0 \text{ for all } j > i\},$$

$$S_i = \{k : \exists j > i \text{ s.t. } g_{jk} = 1\},$$

and let $t_i = |T_i|$.

Example 24: For the matrix

F 



T 2 = {3} and S 2 = {1,2}. 

Note that the $T_i$ are disjoint and
$S_i = \bigcup_{j=i+1}^{\ell} T_j$. Therefore,
$|S_i| = \sum_{j=i+1}^{\ell} t_j$. Denoting the restriction of $g_j$
to the indices in $S_i$ by $g_{jS_i}$, we have

$$D_i = t_i + s_i, \tag{28}$$

where
$s_i = d_H(g_{iS_i}, \langle g_{(i+1)S_i}, \dots, g_{\ell S_i} \rangle)$.
By a similar reasoning as in the proof of Lemma 18, it can be shown
that there exists a matrix $G$ with

$$s_i \le d_H(g_{jS_i}, \langle g_{(j+1)S_i}, \dots, g_{\ell S_i} \rangle) \quad \forall i < j,$$

and

$$E(G) = E_\ell.$$

Therefore, for such a matrix $G$, we have (cf. proof of Lemma
22)

$$s_i \le d(|S_i|, \ell - i + 1). \tag{29}$$



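Using the structure of the set $S_i$, we can bound $s_i$ further:

Lemma 25 (Bound on Sub-distances):
$s_i \le \lfloor \frac{|S_i|}{2} \rfloor$.

Proof: We will find a linear combination of
$\{g_{(i+1)S_i}, \dots, g_{\ell S_i}\}$ whose Hamming distance to
$g_{iS_i}$ is at most $\lfloor \frac{|S_i|}{2} \rfloor$. To this end
define $w = \sum_{j=i+1}^{\ell} \alpha_j g_{jS_i}$, where
$\alpha_j \in \{0, 1\}$. Also define
$w_k = \sum_{j=i+1}^{k} \alpha_j g_{jS_i}$. Noting that the sets
$T_j$ are disjoint with $\bigcup_{j=i+1}^{\ell} T_j = S_i$, we have

$$d_H(g_{iS_i}, w) = \sum_{j=i+1}^{\ell} d_H(g_{iT_j}, w_{T_j}).$$

We now claim that choosing the $\alpha_j$s in the order
$\alpha_{i+1}, \dots, \alpha_\ell$ by

$$\arg\min_{\alpha_j \in \{0, 1\}} d_H\big(g_{iT_j},\ w_{(j-1)T_j} + \alpha_j g_{jT_j}\big), \tag{30}$$

we obtain $d_H(g_{iS_i}, w) \le \lfloor \frac{|S_i|}{2} \rfloor$. To
see this, note that by definition of the sets $T_j$ we have
$w_{T_j} = w_{jT_j}$. Also observe that by the rule (30) for
choosing $\alpha_j$, we have
$d_H(g_{iT_j}, w_{jT_j}) \le \lfloor \frac{t_j}{2} \rfloor$. Thus,

$$\begin{aligned}
d_H(g_{iS_i}, w) &= \sum_{j=i+1}^{\ell} d_H(g_{iT_j}, w_{T_j}) \\
&= \sum_{j=i+1}^{\ell} d_H(g_{iT_j}, w_{jT_j}) \\
&\le \sum_{j=i+1}^{\ell} \Big\lfloor \frac{t_j}{2} \Big\rfloor \\
&\le \Big\lfloor \frac{|S_i|}{2} \Big\rfloor.
\end{aligned}$$

■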
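
The proof of Lemma 25 is a greedy procedure and can be run as written. The sketch below assumes 0-based row and column indices and uses illustrative function names. It relies on two facts that follow from the definition of $T_j$: row $g_j$ is all ones on $T_j$, so adding it flips every position of $T_j$, and rows below $g_j$ are zero on $T_j$, so later choices cannot disturb it.

```python
def last_one_sets(G):
    """T[i] = columns k with G[i][k] = 1 and G[j][k] = 0 for all j > i."""
    T = [[] for _ in G]
    for k in range(len(G[0])):
        rows = [i for i in range(len(G)) if G[i][k] == 1]
        if rows:
            T[rows[-1]].append(k)
    return T

def greedy_sub_distance(G, i):
    """Apply rule (30) for row i; return (distance on S_i, |S_i|)."""
    T = last_one_sets(G)
    w = [0] * len(G[0])                  # running combination of lower rows
    for j in range(i + 1, len(G)):
        def dist(alpha):
            # distance to g_i on T_j if alpha * g_j is added to w
            return sum(G[i][k] != (w[k] ^ (alpha & G[j][k])) for k in T[j])
        if dist(1) < dist(0):
            w = [a ^ b for a, b in zip(w, G[j])]
    S = [k for j in range(i + 1, len(G)) for k in T[j]]
    return sum(G[i][k] != w[k] for k in S), len(S)

G = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
for i in range(len(G)):
    print(i, greedy_sub_distance(G, i))
```

The distance returned is an upper bound on $s_i$, since $s_i$ is the minimum over all combinations while the greedy rule inspects only one.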



Combining (28), (29) and Lemma 25, and noting that the
invertibility of $G$ implies $\sum_{i=1}^{\ell} t_i = \ell$, we
obtain the following:

Lemma 26 (Improved Upper Bound):

$$E_\ell \le \max_{\sum_{i=1}^{\ell} t_i = \ell} \ \frac{1}{\ell} \sum_{i=1}^{\ell} \log_\ell (t_i + s_i),$$

where

$$s_i = \min\Big\{ \Big\lfloor \frac{1}{2} \sum_{j=i+1}^{\ell} t_j \Big\rfloor,\ d\Big(\sum_{j=i+1}^{\ell} t_j,\ \ell - i + 1\Big) \Big\}.$$

The bound given in the above lemma is plotted in Figure 3.
It is seen that no matrix with exponent greater than $\frac{1}{2}$
can be found for $\ell \le 10$.

In addition to providing an upper bound to $E_\ell$, Lemma
26 narrows down the search for matrices which achieve $E_\ell$.
In particular, it enables us to list all sets of possible partial
distances with exponents greater than $\frac{1}{2}$. For
$11 \le \ell \le 14$, an exhaustive search for matrices with a
"good" set of partial distances bounded by Lemma 26 (of which there
are 285) shows that no matrix with exponent greater than
$\frac{1}{2}$ exists.

VI. Construction Using BCH Codes 

We will now show how to construct a matrix $G$ of dimension
$\ell = 16$ with exponent exceeding $\frac{1}{2}$. In fact, we will
show how to construct the best such matrix. More generally, we will
show how BCH codes give rise to "good matrices." Our construction
of $G$ consists of taking an $\ell \times \ell$ binary matrix whose
$k$ last rows form a generator matrix of a $k$-dimensional BCH code.
The partial distance $D_{\ell - k + 1}$ is then at least as large as
the minimum distance of this $k$-dimensional code.

To describe the partial distances explicitly we make use
of the spectral view of BCH codes as sub-field sub-codes
of Reed-Solomon codes as described in [4]. We restrict our
discussion to BCH codes of length $\ell = 2^m - 1$,
$m \in \mathbb{N}$.

Fix $m \in \mathbb{N}$. Partition the set of integers
$\{0, 1, \dots, 2^m - 2\}$ into a set $\mathcal{C}$ of chords,

$$\mathcal{C} = \bigcup_{i=0}^{2^m - 2} \{2^k i \bmod (2^m - 1) : k \in \mathbb{N}\}.$$



Example 27 (Chords for $m = 5$): For $m = 5$ the list of
chords is given by

C = {{0}, {1,2, 4, 8, 16}, {3, 6, 12, 17, 24}, 
{5, 9, 10, 18, 20}, {7, 14, 19, 25, 28}, 
{11, 13, 21, 22, 26}, {15, 23, 27, 29, 30}}. 

■ 
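
The chords are the orbits of multiplication by 2 modulo $2^m - 1$ (often called cyclotomic cosets) and are easy to enumerate. The sketch below is one straightforward way to do so; the function name is illustrative.

```python
def chords(m):
    """Chords of {0, ..., 2**m - 2}, ordered by their smallest element."""
    n = 2 ** m - 1
    seen, result = set(), []
    for i in range(n):
        if i in seen:
            continue
        chord, x = [], i
        while x not in chord:        # follow i, 2i, 4i, ... modulo n
            chord.append(x)
            x = (2 * x) % n
        seen.update(chord)
        result.append(sorted(chord))
    return result

for chord in chords(5):
    print(chord)
```

For $m = 5$ this prints the seven chords listed in Example 27, one per line and in the same order.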

Let $C$ denote the number of chords and assume that the
chords are ordered according to their smallest element as in
Example 27. Let $\mu(i)$ denote the minimal element of chord
$i$, $1 \le i \le C$, and let $l(i)$ denote the number of elements
in chord $i$. Note that by this convention $\mu(i)$ is increasing.
It is well known that $1 \le l(i) \le m$ and that $l(i)$ must
divide $m$.

Example 28 (Chords for $m = 5$): In Example 27 we have
$C = 7$, $l(1) = 1$, $l(2) = \dots = l(7) = 5 = m$, $\mu(1) = 0$,
$\mu(2) = 1$, $\mu(3) = 3$, $\mu(4) = 5$, $\mu(5) = 7$, $\mu(6) = 11$,
$\mu(7) = 15$. ■
Consider a BCH code of length $\ell$ and dimension
$\sum_{j=k}^{C} l(j)$ for some $k \in \{1, \dots, C\}$. It is
well-known that this code has minimum distance at least
$\mu(k) + 1$. Further, the generator matrix of this code is obtained
by concatenating the generator matrices of two BCH codes of
respective dimensions $\sum_{j=k+1}^{C} l(j)$ and $l(k)$. This being
true for all $k \in \{1, \dots, C\}$, it is easy to see that the
generator matrix of the $\ell$ dimensional (i.e., rate 1) BCH code,
which will be the basis of our construction, has the property that
its last $\sum_{j=k}^{C} l(j)$ rows form the generator matrix of a
BCH code with minimum distance at least $\mu(k) + 1$. This
translates to the following lower bound on partial distances
$\{D_i\}$: Clearly, $D_i$ is at least as large as the minimum
distance of the code generated by the last $\ell - i + 1$ rows of
the matrix. Therefore, if
$\sum_{j=k+1}^{C} l(j) < \ell - i + 1 \le \sum_{j=k}^{C} l(j)$, then

$$D_i \ge \mu(k) + 1.$$

The exponent $E$ associated with these partial design distances
can then be bounded as

$$E \ge \frac{1}{2^m - 1} \sum_{i=1}^{C} l(i) \log_{2^m - 1}(\mu(i) + 1). \qquad (31)$$

Example 29 (BCH Construction for $\ell = 31$): From the list
of chords computed in Example 27 we obtain

$$E \ge \frac{5}{31} \log_{31}(2 \cdot 4 \cdot 6 \cdot 8 \cdot 12 \cdot 16) \approx 0.526433.$$

An explicit check of the partial distances reveals that the above
inequality is in fact an equality. ■
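As a numerical sanity check of (31) and Example 29, the sketch below recomputes the chords, assigns the design distance $\mu(k) + 1$ to the $l(k)$ rows belonging to chord $k$, and evaluates the bound. The helper names (`chords`, `design_distances`, `bch_exponent_bound`) are illustrative assumptions, not notation from the paper.

```python
import math


def chords(m):
    """Chords of {0, ..., 2^m - 2} under doubling mod 2^m - 1, sorted by minimum."""
    n = 2**m - 1
    seen, result = set(), []
    for i in range(n):
        if i in seen:
            continue
        chord, x = [], i
        while x not in chord:
            chord.append(x)
            x = (2 * x) % n
        seen.update(chord)
        result.append(sorted(chord))
    return result


def design_distances(m):
    """Lower bounds mu(k) + 1 on D_1, ..., D_ell, listed from the top row down."""
    bounds = []
    for chord in chords(m):
        # Chord k accounts for l(k) consecutive rows, all with the same bound.
        bounds.extend([min(chord) + 1] * len(chord))
    return bounds


def bch_exponent_bound(m):
    """Right-hand side of (31): (1/ell) * sum_i log_ell(design distance of row i)."""
    ell = 2**m - 1
    return sum(math.log(d, ell) for d in design_distances(m)) / ell


print(round(bch_exponent_bound(5), 6))
```

For $m = 5$ this evaluates the same product $2 \cdot 4 \cdot 6 \cdot 8 \cdot 12 \cdot 16$ that appears in Example 29.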
For large $m$, the bound in (31) is not convenient to work
with. The asymptotic behavior of the exponent is however
easy to assess by considering the following bound. Note
that no $\mu(i)$ (except for $i = 1$) can be an even number
since otherwise $\mu(i)/2$, being an integer, would be contained
in chord $i$, a contradiction. It follows that for the smallest
exponent all chords (except chord 1) must be of length $m$ and
that $\mu(i) = 2i - 3$ for $i \ge 2$, i.e., the chord minima are the
consecutive odd numbers $1, 3, 5, \dots$ (compare Example 28).
This gives rise to the bound

$$E \ge \frac{1}{(2^m - 1)\log(2^m - 1)} \cdot \left( \sum_{k=1}^{a} m \log(2k) + (2^m - 2 - am) \log(2a + 2) \right), \qquad (32)$$

where $a = \lfloor \frac{2^m - 2}{m} \rfloor$. It is easy to see that as
$m \to \infty$ the above exponent tends to 1, the best exponent one
can hope for (cf. Lemma 21). We have also seen in Example 29 that
for $m = 5$ we achieve an exponent strictly above $\frac{1}{2}$.
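To get a feel for how the bound (32) behaves as $m$ grows, it can be evaluated directly. The sketch below is a plain transcription of the formula under the definition of $a$ given above; the function name `asymptotic_bound` is our own.

```python
import math


def asymptotic_bound(m):
    """Evaluate the right-hand side of (32) for a given m."""
    ell = 2**m - 1
    a = (2**m - 2) // m                      # a = floor((2^m - 2) / m)
    # a full chords of length m with design distances 2, 4, ..., 2a
    total = sum(m * math.log(2 * k) for k in range(1, a + 1))
    # the remaining rows are assigned design distance 2a + 2
    total += (2**m - 2 - a * m) * math.log(2 * a + 2)
    return total / (ell * math.log(ell))


for m in (5, 10, 15, 20):
    print(m, round(asymptotic_bound(m), 4))
```

The values increase with $m$ but approach 1 only slowly, and for $m = 5$ the result is weaker than the chord-specific bound (31), as expected from a worst-case estimate.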

Binary BCH codes exist for lengths of the form $2^m - 1$.
To construct matrices of other lengths, we use shortening, a 
standard method to construct good codes of smaller lengths 
from an existing code, which we recall here: Given a code C, 
fix a symbol, say the first one, and divide the codewords into 
two sets of equal size depending on whether the first symbol 
is a 1 or a 0. Choose the set having zero in the first symbol 
and delete this symbol. The resulting codewords form a linear 
code with both the length and dimension decreased by one. 
The minimum distance of the resulting code is at least as large 
as the initial distance. The generator matrix of the resulting 
code can be obtained from the original generator matrix by 
removing a generator vector having a one in the first symbol, 
adding this vector to all the remaining vectors starting with a 
one and removing the first column. 

Now consider an $\ell \times \ell$ matrix $G_\ell$. Find the column
$j$ with the longest run of zeros at the bottom, and let $i$ be the
last row with a 1 in this column. Then add the $i$th row to all the
other rows with a 1 in the $j$th column. Finally, remove the $i$th
row and the $j$th column to obtain an $(\ell - 1) \times (\ell - 1)$
matrix $G_{\ell-1}$. The matrix $G_{\ell-1}$ satisfies the following
property.

Lemma 30 (Partial Distances after Shortening): Let the
partial distances of $G_\ell$ be given by
$\{D_1 \le \dots \le D_\ell\}$. Let $G_{\ell-1}$ be the resulting
matrix obtained by applying the above shortening procedure with the
$i$th row and the $j$th column. Let the partial distances of
$G_{\ell-1}$ be $\{D'_1, \dots, D'_{\ell-1}\}$. We have

$$D'_k \ge D_k, \quad 1 \le k \le i - 1, \qquad (33)$$
$$D'_k = D_{k+1}, \quad i \le k \le \ell - 1. \qquad (34)$$

Proof: Let $G_\ell = [g_1^T, \dots, g_\ell^T]^T$ and
$G_{\ell-1} = [{g'_1}^T, \dots, {g'_{\ell-1}}^T]^T$. For $i \le k$,
$g'_k$ is obtained by removing the $j$th column of $g_{k+1}$. Since
all these rows have a zero in the $j$th position their partial
distances do not change, which in turn implies (34).

For $k < i$, note that the code
$C' = \langle g'_k, \dots, g'_{\ell-1} \rangle$ is obtained by
shortening $C = \langle g_k, \dots, g_\ell \rangle$. Therefore,
$D'_k \ge d_{\min}(C') \ge d_{\min}(C) = D_k$. ■
Example 31 (Shortening of Code): Consider the matrix 

"10 10 1" 
10 1 
10 1 . 
1 1 
110 11 

The partial distances of this matrix are $\{1, 2, 2, 2, 4\}$.
According to our procedure, we pick the 3rd column since it has a
run of three zeros at the bottom (which is maximal). We then
add the second row to the first row (since it also has a 1 in
the third column). Finally, deleting column 3 and row 2 we
obtain the matrix

" 1 " 
10 1 
11' 
1111 

The partial distances of this matrix are {1, 2, 2, 4}. ■ 
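The shortening step and Lemma 30 can be tried out on a small matrix. The sketch below computes partial distances by brute force and applies one shortening step. The $5 \times 5$ matrix `G5` is a hypothetical matrix chosen by us to be consistent with the partial distances and the row and column choices described in Example 31; it is not guaranteed to be the matrix printed in the paper. The function names are likewise our own.

```python
from itertools import product


def partial_distances(G):
    """Brute-force D_i = min weight of g_i + span(g_{i+1}, ..., g_ell) over GF(2)."""
    dists = []
    for i, row in enumerate(G):
        tail = G[i + 1:]
        best = len(row)
        for coeffs in product((0, 1), repeat=len(tail)):
            v = list(row)
            for c, t in zip(coeffs, tail):
                if c:
                    v = [a ^ b for a, b in zip(v, t)]
            best = min(best, sum(v))
        dists.append(best)
    return dists


def shorten(G):
    """One shortening step; assumes G is square and invertible over GF(2)."""
    ell = len(G)

    def zero_run(j):
        # number of consecutive zeros at the bottom of column j
        run = 0
        while run < ell and G[ell - 1 - run][j] == 0:
            run += 1
        return run

    j = max(range(ell), key=zero_run)   # column with the longest bottom zero run
    i = ell - 1 - zero_run(j)           # last row with a 1 in column j
    pivot = G[i]
    out = []
    for r, row in enumerate(G):
        if r == i:
            continue                    # row i is removed
        if row[j]:
            row = [a ^ b for a, b in zip(row, pivot)]
        out.append([x for c, x in enumerate(row) if c != j])
    return out


# Hypothetical matrix (our assumption), consistent with Example 31.
G5 = [
    [1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
print(partial_distances(G5))            # [1, 2, 2, 2, 4]
print(partial_distances(shorten(G5)))   # [1, 2, 2, 4]
```

The brute-force search is exponential in the number of rows, so it is only meant for small illustrative matrices such as this one.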

Example 32 (Construction of Code with $\ell = 16$): Starting
with the $31 \times 31$ BCH matrix and repeatedly applying the
above procedure results in the exponents listed in Table I.



$\ell$  exponent    $\ell$  exponent    $\ell$  exponent    $\ell$  exponent
31      0.52643     27      0.50836     23      0.50071     19      0.48742
30      0.52205     26      0.50470     22      0.49445     18      0.48968
29      0.51710     25      0.50040     21      0.48705     17      0.49175
28      0.51457     24      0.50445     20      0.49659     16      0.51828

TABLE I
The best exponents achieved by shortening the BCH matrix of length 31.



The $16 \times 16$ matrix having an exponent of 0.51828 is

[16 x 16 matrix omitted]


The partial distances of this matrix are
$\{16, 8, 8, 8, 8, 6, 6, 4, 4, 4, 4, 2, 2, 2, 2, 1\}$. Using Lemma 26
we observe that for the $16 \times 16$ case there are only 11 other
possible sets of partial distances which have a better exponent
than the above matrix. An exhaustive search for matrices with
such sets of partial distances confirms that no such matrix
exists. Hence, the above matrix achieves the best possible
exponent among all $16 \times 16$ matrices. ■
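The exponent quoted for the $16 \times 16$ matrix can be recomputed from its partial distances alone. The sketch below assumes the exponent of an $\ell \times \ell$ matrix is $\frac{1}{\ell} \sum_{i=1}^{\ell} \log_\ell D_i$, which is the same form used in (31); the function name `exponent` is our own.

```python
import math


def exponent(partial_distances):
    """Exponent (1/ell) * sum_i log_ell(D_i) for a list of partial distances."""
    ell = len(partial_distances)
    return sum(math.log(d, ell) for d in partial_distances) / ell


D16 = [16, 8, 8, 8, 8, 6, 6, 4, 4, 4, 4, 2, 2, 2, 2, 1]
print(round(exponent(D16), 5))
```

The value depends only on the multiset of partial distances, so the order in which they are listed does not matter.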
Appendix

In this section we prove the following lemma which is used 
in the proof of Lemma 6. 
Lemma 33 (Mutual Information of $W^k$): Let $W$ be a symmetric
B-DMC and let $W^k$ denote the channel

$$W^k(y_1^k \mid x) = \prod_{i=1}^{k} W(y_i \mid x).$$

If $I(W) \in (\delta, 1 - \delta)$ for some $\delta > 0$, then there
exists an $\eta(\delta) > 0$ such that $I(W^k) - I(W) > \eta(\delta)$.

The proof of Lemma 33 is in turn based on the following
theorem.

Theorem 34 ([5], [6] Extremes of Information Combining):
Let $W_1, \dots, W_k$ be $k$ symmetric B-DMCs with capacities
$I_1, \dots, I_k$ respectively. Let $W^{(k)}$ denote the channel with
transition probabilities

$$W^{(k)}(y_1^k \mid x) = \prod_{i=1}^{k} W_i(y_i \mid x).$$

Also let $W^{(k)}_{\mathrm{BSC}}$ denote the channel with transition
probabilities

$$W^{(k)}_{\mathrm{BSC}}(y_1^k \mid x) = \prod_{i=1}^{k} W_{\mathrm{BSC}(\epsilon_i)}(y_i \mid x),$$

where $\mathrm{BSC}(\epsilon_i)$ denotes the binary symmetric channel
(BSC) with crossover probability $\epsilon_i \in [0, \frac{1}{2}]$,
$\epsilon_i = h^{-1}(1 - I_i)$, where $h$ denotes the binary entropy
function. Then, $I(W^{(k)}) \ge I(W^{(k)}_{\mathrm{BSC}})$.

Remark 35: Consider the transmission of a single bit $X$
using $k$ independent symmetric B-DMCs $W_1, \dots, W_k$ with
capacities $I_1, \dots, I_k$. Theorem 34 states that over the class of
all symmetric channels with given mutual informations, the
mutual information between the input and the output vector is
minimized when each of the individual channels is a BSC.

Proof of Lemma 33: Let $\epsilon \in [0, \frac{1}{2}]$ be the
crossover probability of a BSC with capacity $I(W)$, i.e.,
$\epsilon = h^{-1}(1 - I(W))$. Note that for $k \ge 2$,

$$I(W^k) \ge I(W^2) \ge I(W).$$

By Theorem 34, we have $I(W^2) \ge I(W^2_{\mathrm{BSC}(\epsilon)})$. A
simple computation shows that

$$I(W^2_{\mathrm{BSC}(\epsilon)}) = 1 + h(2\epsilon\bar{\epsilon}) - 2h(\epsilon),$$

where $\bar{\epsilon} = 1 - \epsilon$. We can then write

$$I(W^k) - I(W) \ge I(W^2_{\mathrm{BSC}(\epsilon)}) - I(W)$$
$$= I(W^2_{\mathrm{BSC}(\epsilon)}) - I(W_{\mathrm{BSC}(\epsilon)})$$
$$= h(2\epsilon\bar{\epsilon}) - h(\epsilon). \qquad (35)$$

Note that $I(W) \in (\delta, 1 - \delta)$ implies
$\epsilon \in (\phi(\delta), \frac{1}{2} - \phi(\delta))$ where
$\phi(\delta) > 0$, which in turn implies
$h(2\epsilon\bar{\epsilon}) - h(\epsilon) > \eta(\delta)$ for some
$\eta(\delta) > 0$. ■

Lemma 36: Consider a symmetric B-DMC $W$. Let $P_e(W)$
denote the bit error probability of uncoded transmission under
MAP decoding. Then,

$$P_e(W) \ge \frac{1}{2}\left(1 - \sqrt{1 - Z(W)^2}\right).$$

Proof: One can check that the inequality is satisfied with
equality for a BSC. It is also known that any symmetric B-DMC
$W$ is equivalent to a convex combination of several, say $K$,
BSCs where the receiver has knowledge of the particular
BSC being used. Let $\{\epsilon_i\}_{i=1}^{K}$ and $\{Z_i\}_{i=1}^{K}$
denote the bit error probabilities and the Bhattacharyya parameters
of the constituent BSCs. Then, $P_e(W)$ and $Z(W)$ are given by

$$P_e(W) = \sum_{i=1}^{K} \alpha_i \epsilon_i, \qquad Z(W) = \sum_{i=1}^{K} \alpha_i Z_i,$$

for some $\alpha_i > 0$, with $\sum_{i=1}^{K} \alpha_i = 1$. Therefore,

$$P_e(W) = \sum_{i=1}^{K} \alpha_i \frac{1}{2}\left(1 - \sqrt{1 - Z_i^2}\right)$$
$$\ge \frac{1}{2}\left(1 - \sqrt{1 - \Big(\sum_{i=1}^{K} \alpha_i Z_i\Big)^2}\right)$$
$$= \frac{1}{2}\left(1 - \sqrt{1 - Z(W)^2}\right),$$

where the inequality follows from the convexity of the function
$x \to 1 - \sqrt{1 - x^2}$ for $x \in (0, 1)$. ■
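Both appendix bounds are easy to probe numerically. The sketch below evaluates the gap $h(2\epsilon\bar{\epsilon}) - h(\epsilon)$ from (35) and compares the two sides of Lemma 36 for a mixture of BSCs. It assumes the standard BSC expressions $P_e = \epsilon$ and $Z = 2\sqrt{\epsilon(1 - \epsilon)}$; the function names are our own and the check is illustrative, not a proof.

```python
import math


def h(p):
    """Binary entropy function in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def gap(eps):
    """Right-hand side of (35): h(2 * eps * (1 - eps)) - h(eps)."""
    return h(2 * eps * (1 - eps)) - h(eps)


def mixture_pe_and_bound(eps_list, alpha_list):
    """Return (P_e(W), Lemma 36 lower bound) for a convex combination of BSCs."""
    pe = sum(a * e for a, e in zip(alpha_list, eps_list))
    z = sum(a * 2 * math.sqrt(e * (1 - e)) for a, e in zip(alpha_list, eps_list))
    bound = 0.5 * (1 - math.sqrt(max(0.0, 1 - z * z)))  # clamp guards rounding
    return pe, bound


print(gap(0.11))
print(mixture_pe_and_bound([0.05, 0.3], [0.4, 0.6]))
```

For a single BSC the two values returned by `mixture_pe_and_bound` coincide up to rounding, matching the equality case noted in the proof.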